Add the containerV2 flag to compute initial heap percentage using processor count, and disable setting the ActiveProcessorCount JVM option #361

Merged

Conversation

@mpritham (Contributor) commented Nov 13, 2023

Before this PR

This is part of a larger effort to improve the stability of Java services in Rubix.

-XX:ActiveProcessorCount

Kubernetes (k8s) resource requests provide a minimum bound on resource usage, whereas the -XX:ActiveProcessorCount JVM option acts as an upper bound on resource usage in several internal JDK features (e.g. GC threads, compiler threads). We currently set the active processor count based on resource requests, so a value intended as a floor ends up capping the JVM. This has led to unpredictable performance under load.
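To make the option concrete, here is a minimal sketch of how a launcher might gate it behind the flag this PR introduces. The function name `processorArgs` and its parameters are hypothetical and are not taken from `launchlib`; how the processor count is derived from resource requests is left out.

```go
package main

import "fmt"

// processorArgs returns the processor-related JVM options.
// Hypothetical helper: containerV2 mirrors the flag named in this PR,
// and processors stands in for a count derived from resource requests.
func processorArgs(containerV2 bool, processors int) []string {
	if containerV2 || processors <= 0 {
		// Leave the option unset so the JVM detects the processor count itself.
		return nil
	}
	return []string{fmt.Sprintf("-XX:ActiveProcessorCount=%d", processors)}
}

func main() {
	fmt.Println(processorArgs(false, 2)) // [-XX:ActiveProcessorCount=2]
	fmt.Println(processorArgs(true, 2))  // []
}
```

With the flag set, no `-XX:ActiveProcessorCount` option is emitted, so the JVM sizes its GC and compiler thread pools from what it detects itself.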

Setting initial heap size to be aware of the number of processors

With the change above, we no longer limit the number of processors available to the JVM. In the past, when limits on the number of processors were removed (due to JDK-8281181 being released in a critical patch update), we encountered several OOM exceptions. This PR reserves 3 MB per processor by subtracting that amount from the heap size. The rough 3 MB figure was obtained by observation (cc: @carterkozak).

After this PR

==COMMIT_MSG==
When the experimental containerV2 flag is set, 1) compute the heap percentage as 75% of the heap minus 3 MB per processor, with a minimum value of 50%, and 2) don't set the -XX:ActiveProcessorCount JVM option.
==COMMIT_MSG==
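
The heuristic in the commit message can be sketched as follows. This is one reading of the description, not the code merged in `launchlib`: it assumes the 75% is taken of the container memory limit, that "3mb" means 3 MiB, and that the result is floored at 50%. The name `heapPercentage` and its signature are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Assumption: the "3mb" in the PR description is treated as 3 MiB.
const bytesPerMiB = 1024 * 1024

// heapPercentage returns the heap size as a percentage of the memory limit:
// 75% of the limit minus 3 MiB per processor, never less than 50%.
func heapPercentage(memoryLimitBytes uint64, processors int) (float64, error) {
	if memoryLimitBytes == 0 {
		return 0, errors.New("memory limit must be positive")
	}
	limit := float64(memoryLimitBytes)
	reserved := 3 * bytesPerMiB * float64(processors)
	pct := (0.75*limit - reserved) / limit * 100
	if pct < 50 {
		pct = 50 // floor from the PR description
	}
	return pct, nil
}

func main() {
	// 1 GiB limit, 4 processors: (768 MiB - 12 MiB) / 1024 MiB.
	pct, _ := heapPercentage(1<<30, 4)
	fmt.Printf("%.2f\n", pct) // 73.83

	// 512 MiB limit, 64 processors: the raw value is 37.5, so the floor applies.
	pct, _ = heapPercentage(512*bytesPerMiB, 64)
	fmt.Printf("%.2f\n", pct) // 50.00
}
```

The second case shows why the floor matters: on a small container that sees many processors, the per-processor reservation alone would otherwise push the heap well below half of the available memory.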

Possible downsides?

This change may waste available memory for services that would benefit from larger heap sizes. The heap size heuristic is also simple; a more sophisticated solution would allow for better heap utilization (see #323).

Pritham Marupaka added 3 commits November 13, 2023 17:42
@mpritham changed the title from "Add the UseProcessorAwareInitialHeapPercentage flag to compute initial heap percentage using processor count, and disable setting the ActiveProcessorCount JVM option" to "Add the UseProcessorAwareInitialHeapSize flag to compute initial heap percentage using processor count, and disable setting the ActiveProcessorCount JVM option" Nov 13, 2023
@mpritham mpritham marked this pull request as ready for review November 13, 2023 23:22
@mpritham mpritham self-assigned this Nov 13, 2023
launchlib/launcher.go (outdated review thread, resolved)
@mpritham mpritham marked this pull request as draft November 13, 2023 23:27
@mpritham (Contributor Author)

Converting back to draft. I'd like to add a test.

launchlib/config.go (outdated review thread, resolved)
Pritham Marupaka added 2 commits November 14, 2023 10:47
…ct' of github.com:palantir/go-java-launcher into pmarupaka/updateActiveProcessorCountFlagAndInitialHeapPct
@mpritham mpritham marked this pull request as ready for review November 14, 2023 15:48
launchlib/launcher.go (outdated review thread, resolved)
launchlib/launcher.go (outdated review thread, resolved)
@mpritham mpritham marked this pull request as draft November 14, 2023 16:58
@mpritham mpritham marked this pull request as ready for review November 14, 2023 19:18
launchlib/config.go (outdated review thread, resolved)
…itial/max ram percentage when memory.limit_in_bytes is not present
@mpritham changed the title from "Add the experimentalContainerV2 flag to compute initial heap percentage using processor count, and disable setting the ActiveProcessorCount JVM option" to "Add the containerV2 flag to compute initial heap percentage using processor count, and disable setting the ActiveProcessorCount JVM option" Nov 16, 2023
launchlib/launcher.go (outdated review thread, resolved)
launchlib/launcher.go (outdated review thread, resolved)
README.md Outdated
``-XX:MaxRAMPercentage=`` nor ``-XX:InitialRAMPercentage=`` prefixes are present in either static or custom jvm opts,
``-Xmx|-Xms`` will both be set to be 75% of the heap size minus 3mb per processor, with a minimum value of 50% of the
heap.
1. If neither ``-XX:MaxRAMPercentage=`` nor ``-XX:InitialRAMPercentage=`` prefixes are present in either static or
Contributor

I think we need to clarify that this case only applies when cgroup memory limits aren't detected

Contributor

It may also be fair to exclude the fallback info from the documentation as it shouldn't be reachable in rubix

Contributor Author

I updated it to specify that the fallback is used when we're not able to get the cgroup memory limit.

launchlib/launcher.go (outdated review thread, resolved)

@carterkozak (Contributor) left a comment

Thanks!

@mpritham (Contributor Author)

👍

@bulldozer-bot bulldozer-bot bot merged commit b1a206c into develop Nov 16, 2023
@bulldozer-bot bulldozer-bot bot deleted the pmarupaka/updateActiveProcessorCountFlagAndInitialHeapPct branch November 16, 2023 17:18
@svc-autorelease (Collaborator)

Released 1.93.0

4 participants